(chore): Refactor CatalogSource controller #30
Conversation
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Force-pushed from 710132b to 5cb17af
First round.
```go
}

func (r *CatalogSourceReconciler) reconcile(ctx context.Context, catalogSource *corev1beta1.CatalogSource) (ctrl.Result, error) {
	job, err := r.ensureUnpackJob(ctx, catalogSource)
```
- When I first create a CatalogSource, the reconciler creates the unpack job, creates the package/BM CRs
- Next reconciliation, unpack job is detected, so I get the unpack job back, but still proceed to check if the unpack job is completed + create the BM/package CRs again? Seems benign from a "what the user sees" perspective, but it's just going to keep eating up unnecessary cpu cycles.
- After a few reconciliation loops, if the job's been GCed, then `ensureUnpackJob` will unpack the same index image again, and then proceed to reattempt to create the child resources again. Now this is eating up unnecessary network bandwidth AND cpu cycles.
> - Next reconciliation, unpack job is detected, so I get the unpack job back, but still proceed to check if the unpack job is completed + create the BM/package CRs again? Seems benign from a "what the user sees" perspective, but it's just going to keep eating up unnecessary cpu cycles.
> - After a few reconciliation loops, if the job's been GCed, then `ensureUnpackJob` will unpack the same index image again, and then proceed to reattempt to create the child resources again. Now this is eating up unnecessary network bandwidth AND cpu cycles.
You are correct that this is the case and something we need to consider for the future. This PR doesn't actually change the original process that was followed. IMO making these changes is outside the scope of this PR and should be captured in another issue. I'm not sure off the top of my head what the solution would look like (maybe a check for ready status == true or a label/annotation before running the unpack logic?).
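For what it's worth, a minimal sketch of that guard might look like the following (hypothetical; it assumes catalogd stores status conditions as `metav1.Condition` values and that a successful unpack sets a `Ready` condition):

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
)

// needsUnpack is a hypothetical helper: skip the unpack logic entirely if a
// previous reconciliation already marked this CatalogSource as ready.
func needsUnpack(catalogSource *corev1beta1.CatalogSource) bool {
	return !meta.IsStatusConditionTrue(catalogSource.Status.Conditions, "Ready")
}
```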
Agree with @everettraven. Also, a good chunk of the brainstorming here is rendered moot by the likely switchover to an aggregated apiserver. We'll no longer be creating/updating/deleting CRs for the underlying catalog metadata.
A similar problem will exist though: can we skip syncing the apiserver storage if it is already up-to-date? We should write a general issue for this.
> by the likely switchover to an aggregated apiserver. We'll no longer be creating/updating/deleting CRs for the underlying catalog metadata.

We won't be creating CRs, but we'll still be creating some objects? E.g. I think the logic for creating the package manifests today is encoded in the packageserver, but we should probably have the aggregated apiservice only be responsible for administering the objects, instead of creating them, so the catalogd controller would still be creating them.
```go
if jobCopy.Status.CompletionTime != nil {
	// Loop through the conditions and pull out all the conditions
	conds := map[batchv1.JobConditionType]batchv1.JobCondition{}
	for _, cond := range jobCopy.Status.Conditions {
		conds[cond.Type] = cond
	}

	// Check for signs of failure first. If any of the below
	// conditions have a status of True return an error
	failConds := []batchv1.JobConditionType{
		batchv1.JobFailed,
		batchv1.JobFailureTarget,
		batchv1.JobSuspended,
	}
	for _, failCond := range failConds {
		if cond, ok := conds[failCond]; ok {
			if cond.Status == v1.ConditionTrue {
				return false, fmt.Errorf("unpack job has condition %q with a status of %q", failCond, v1.ConditionTrue)
			}
		}
	}

	// No failures so ensure the job has a completed status
	// TODO: This is probably redundant. Check to see if we can just return true if we made it here
	if cond, ok := conds[batchv1.JobComplete]; ok {
		if cond.Status == v1.ConditionTrue {
			return true, nil
		}
	}
}

return false, nil
```
```go
if jobCopy.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		if cond.Status == v1.ConditionTrue {
			if cond.Type == batchv1.JobComplete {
				return true, nil
			} else {
				return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
			}
		}
	}
}
return false, nil
```
I could be wrong here, but my understanding was that there could be multiple statuses existing at the same time. For example, a Job could have both the `batchv1.JobFailed` and `batchv1.JobComplete` statuses. That being said, I do think this implementation could be improved to be something like:
```go
if jobCopy.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
			return false, nil
		}
	}
	return true, nil
}
return false, nil
```
Done in f5f99c5
```go
for _, cond := range jobCopy.Status.Conditions {
	if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
		return false, nil
	}
}
```

gotta return the error instead of nil:

```go
return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
```
🤦 - good catch
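For reference, the completed check with that fix applied reads roughly like this (same names as in the diff above):

```go
if jobCopy.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		// Any condition other than JobComplete that is True (JobFailed,
		// JobFailureTarget, JobSuspended, ...) means the unpack failed.
		if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
			return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
		}
	}
	// Completed with no failure conditions: the unpack succeeded.
	return true, nil
}
return false, nil
```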
```go
// updateStatusError will update the CatalogSource.Status.Conditions
// to have the condition Type "Ready" with a Status of "False" and a Reason
// of "UnpackError". This function is used to signal that a CatalogSource
// is in an error state and that catalog contents are not available on cluster
func updateStatusError(catalogSource *corev1beta1.CatalogSource, err error) {
```
If this is the purpose of the function, it feels like it should be renamed to `updateStatusUnpackError` instead of the generic `updateStatusError`, which should instead have the signature `updateStatusError(reason Reason, catalogSource *corev1beta1.CatalogSource)`.
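A sketch of that split might look like the following (hypothetical shape; it assumes the conditions are `metav1.Condition` values and keeps the error parameter from the current signature):

```go
// updateStatusError sets the Ready condition to False with the given reason.
func updateStatusError(reason string, catalogSource *corev1beta1.CatalogSource, err error) {
	meta.SetStatusCondition(&catalogSource.Status.Conditions, metav1.Condition{
		Type:    "Ready",
		Status:  metav1.ConditionFalse,
		Reason:  reason,
		Message: err.Error(),
	})
}

// updateStatusUnpackError preserves today's behavior as a thin wrapper.
func updateStatusUnpackError(catalogSource *corev1beta1.CatalogSource, err error) {
	updateStatusError("UnpackError", catalogSource, err)
}
```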
This function was implemented as part of #17. Since #6 targets better error handling and status writing I think this is a bit outside the scope of this PR and would be better addressed by the PR that handles fixing #6. That being said, if there are strong feelings that it should be done as part of this PR I am okay with that - I'm just trying to keep the scope/overlap as small as possible
```go
// createBundleMetadata will create a `BundleMetadata` resource for each
// "olm.bundle" object that exists for the given catalog contents. Returns an
// error if any are encountered.
func (r *CatalogSourceReconciler) buildBundleMetadata(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource corev1beta1.CatalogSource) error {
```
We should probably not return an error at the first sign of trouble and abandon the operation. Returning an aggregated list of errors is probably the way to go here. (This'll be useful when we talk about changing the child resources to something else too.)
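For illustration, an aggregated version of `buildBundleMetadata` could be shaped like this (a sketch, not the PR's implementation; `bundleMetadataFor` is a hypothetical constructor and the `r.Client` field name is assumed):

```go
import (
	"fmt"

	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

func (r *CatalogSourceReconciler) buildBundleMetadata(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource corev1beta1.CatalogSource) error {
	var errs []error
	for _, bundle := range declCfg.Bundles {
		// bundleMetadataFor is hypothetical: build the BundleMetadata
		// resource for this "olm.bundle" object.
		bundleMeta := bundleMetadataFor(bundle, catalogSource)
		if err := r.Client.Create(ctx, bundleMeta); err != nil {
			// Record the failure and keep going instead of bailing out.
			errs = append(errs, fmt.Errorf("creating BundleMetadata %q: %w", bundle.Name, err))
		}
	}
	// NewAggregate returns nil when errs is empty.
	return utilerrors.NewAggregate(errs)
}
```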
I don't disagree. IMO this is better left to be addressed by the fix to #6 but if there are strong feelings towards doing it in this PR then that is fine by me. I'm just trying to be conscious of keeping the scope/overlap as small as possible
// "olm.package" object that exists for the given catalog contents. | ||
// `Package.Spec.Channels` is populated by filtering all "olm.channel" objects | ||
// where the "packageName" == `Package.Name`. Returns an error if any are encountered. | ||
func (r *CatalogSourceReconciler) createPackages(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource *corev1beta1.CatalogSource) error { |
Same comment here, regarding the error being returned at the first sign of trouble.
See #30 (comment)
Description
Cleans up the controllers by:
- Removing the `Package` and `BundleMetadata` controllers
- Refactoring the `CatalogSource` controller to follow a similar approach to rukpak and operator-controller for consistency
- Breaking the logic into smaller functions (out of `reconcile()`) to help new contributors be able to more easily go through and understand what each function is doing

Motivation
- `CatalogSource` controller #5